Experiments with Decision Tree Classifiers – Discretization of Numerical Attributes
Abstract
Classification algorithms are used in numerous applications every day, from assigning letter grades to students' scores to computerized letter recognition in mail processing. Discretization consists of applying a set of rules to reduce the number of discrete intervals from which an attribute is assigned. Discretization is generally applied to datasets whose numerical range consists of continuous values, replacing the numerical domain with a nominal domain. This paper's primary focus is the application of discretization to Classification and Regression Trees (CART), using the Equal Width, Equal Frequency and CAIM discretization algorithms. While no method dominated the others on all datasets, results show that discretization prior to classification generally decreases processing time while increasing the accuracy of the classifier. The data presented here further substantiates this claim.
Index Terms: Supervised discretization, CART, classification, class-attribute interdependency, continuous attributes, machine learning algorithm.
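The two unsupervised schemes named in the abstract, Equal Width and Equal Frequency, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice of three bins, and the sample scores are all assumptions made for illustration, and CAIM (the supervised method) is not shown.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior cut points chosen so each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def discretize(values, edges):
    """Replace each numeric value with the index of its interval (0 .. len(edges))."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical student scores, used only to show the two schemes side by side.
scores = [52, 55, 61, 68, 70, 71, 73, 88, 94, 100]

width_bins = discretize(scores, equal_width_edges(scores, 3))
freq_bins = discretize(scores, equal_frequency_edges(scores, 3))

print(width_bins)  # interval index per score under Equal Width
print(freq_bins)   # interval index per score under Equal Frequency
```

Both functions return only the interior cut points, so the resulting interval indices can be fed to a classifier as a nominal attribute. With heavily repeated values, the Equal Frequency cut points may coincide, leaving some intervals empty; a production implementation would need to handle that case.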
Similar Resources
Action Rules Discovery Based on Tree Classifiers and Meta-actions
Action rules describe possible transitions of objects from one state to another with respect to a distinguished attribute. Early research on action rule discovery usually required the extraction of classification rules before constructing any action rule. Newest algorithms discover action rules directly from a decision system. To our knowledge, all these algorithms assume that all attributes ar...
CMP: A Fast Decision Tree Classifier Using Multivariate Predictions
Most decision tree classifiers are designed to keep class histograms for single attributes, and to select a particular attribute for the next split using said histograms. In this paper, we propose a technique where, by keeping histograms on attribute pairs, we achieve (i) a significant speed-up over traditional classifiers based on single attribute splitting, and (ii) the ability of building cl...
A greedy algorithm for supervised discretization
We present a greedy algorithm for supervised discretization using a metric defined on the space of partitions of a set of objects. The proposed technique is useful for preparing data for classifiers that require nominal attributes. Experimental work on decision trees and naïve Bayes classifiers confirms the efficacy of the proposed algorithm.
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
CLOUDS: A Decision Tree Classifier for Large Datasets
Classification for very large datasets has many practical applications in data mining. Techniques such as discretization and dataset sampling can be used to scale up decision tree classifiers to large datasets. Unfortunately, both of these techniques can cause a significant loss in accuracy. We present a novel decision tree classifier called CLOUDS, which samples the splitting points for numeri...